This page last changed on Sep 07, 2011 by wlapka.

This page describes how to install and configure a SAM/Nagios node from scratch.

IMPORTANT
These are the installation instructions for the SAM release SAM-Update-12.
This is not the latest SAM release.
Information about the latest release can be found on the Installing SAM-Nagios page.

1. Environment

Disable SELinux in /etc/selinux/config:

SELINUX=disabled
You must reboot the machine after changing this variable.
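The setting can be checked before the reboot. This is a minimal sketch; `selinux_config_state` is a hypothetical helper, not part of SAM or Yaim. It only reads the configuration file, so it says nothing about the running kernel (`getenforce` reports that mode).

```shell
# Hypothetical helper: report whether an SELinux config file sets SELINUX=disabled.
# Only the file is inspected; the running system keeps its mode until the reboot.
selinux_config_state() {
    conf="${1:-/etc/selinux/config}"
    # Take the last uncommented "SELINUX=" line (SELINUXTYPE= does not match)
    value=$(sed -n 's/^[[:space:]]*SELINUX=\([[:alnum:]]*\).*/\1/p' "$conf" | tail -n 1)
    if [ "$value" = "disabled" ]; then
        echo "disabled"
    else
        echo "not disabled (${value:-unset})"
        return 1
    fi
}
```

Usage: `selinux_config_state || echo "edit /etc/selinux/config and reboot"`.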

2. Requirements

You need to install a host certificate in order to secure the Nagios web portal. The certificate should be placed in the standard location:

ls -l /etc/grid-security/host*
-rw-r--r-- 1 root root 2286 Oct 28 19:26 /etc/grid-security/hostcert.pem
-r-------- 1 root root  887 Oct 28 19:25 /etc/grid-security/hostkey.pem
The /etc/grid-security directory must have 755 permissions, and the certificate must have the SSL client purpose:
openssl x509 -in /etc/grid-security/hostcert.pem -noout -purpose | grep "SSL client"
SSL client : Yes
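Both requirements can be checked in one pass. A minimal sketch, assuming GNU `stat` (as on SLC5) and the file modes shown in the listing above; `check_grid_security` is a hypothetical name.

```shell
# Hypothetical helper: compare the modes of the grid-security directory and the
# host credentials with the listing above, then check the SSL client purpose.
check_grid_security() {
    dir="${1:-/etc/grid-security}"
    status=0
    for entry in "$dir:755" "$dir/hostcert.pem:644" "$dir/hostkey.pem:400"; do
        path="${entry%:*}"    # everything before the last ":"
        want="${entry##*:}"   # the expected octal mode
        have=$(stat -c '%a' "$path" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "WRONG $path: mode ${have:-missing}, expected $want"
            status=1
        fi
    done
    # Same test as the openssl command above
    if ! openssl x509 -in "$dir/hostcert.pem" -noout -purpose 2>/dev/null | grep -q "SSL client : Yes"; then
        echo "WRONG $dir/hostcert.pem: no SSL client purpose"
        status=1
    fi
    return $status
}
```

No output and exit status 0 mean the layout matches this page.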

If you plan to use the SAM DB (i.e. NCG_TOPOLOGY_USE_SAM or NCG_REMOTE_USE_SAM is set to true), you need to request access to the SAM PI from your Nagios host. Details on enabling access are maintained by the SAM team here. In the request, provide the machine address(es) and state that you require access under the "EGEE-SA1 Monitoring Profile".
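Whether this request is needed follows from site-info.def. A minimal sketch with a hypothetical `sam_db_switches` helper that prints which of the two variables are set to true; it assumes simple `VAR=true` or `VAR="true"` assignments.

```shell
# Hypothetical helper: list the SAM DB switches enabled in a Yaim site-info.def.
# Empty output means no SAM PI access request is needed.
sam_db_switches() {
    grep -E '^[[:space:]]*(NCG_TOPOLOGY_USE_SAM|NCG_REMOTE_USE_SAM)=["'\'']?true' "$1" |
        sed 's/=.*//; s/^[[:space:]]*//'
}
```

Usage: `sam_db_switches site-info.def`.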

3. Repositories

Packages:

Manually installed:

Modifications to original repo files:

  • epel.repo
    [epel]
    name=Extra Packages for Enterprise Linux add-ons, no formal support from CERN
    baseurl=http://linuxsoft.cern.ch/epel/5/$basearch
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL
    gpgcheck=1
    enabled=0
    protect=0
  • dag.repo
    [dag]
    name=DAG (http://dag.wieers.com) add-on packages, no formal support from CERN
    baseurl=http://linuxsoft.cern.ch/dag/redhat/el5/en/$basearch/dag
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-dag
    gpgcheck=1
    enabled=0
    protect=0
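To confirm that these repositories stay disabled, a single value can be read from a .repo file. The sketch below is a hypothetical helper, not a yum command; it assumes plain `key=value` lines (spaces around `=` allowed, as in rpmforge.repo) and does not handle multi-line values such as the continued gpgkey lines.

```shell
# Hypothetical helper: print KEY from [SECTION] of a yum .repo file.
# Usage: repo_get FILE SECTION KEY
repo_get() {
    awk -v want="$2" -v key="$3" '
        /^[ \t]*\[/ {
            name = $0
            sub(/^[ \t]*\[/, "", name); sub(/].*$/, "", name)
            insect = (name == want)          # inside the requested section?
            next
        }
        insect {
            line = $0
            sub(/^[ \t]+/, "", line)
            n = index(line, "=")
            if (n == 0) next
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            sub(/[ \t]+$/, "", k); sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }' "$1"
}
```

Usage: `repo_get /etc/yum.repos.d/epel.repo epel enabled` should print `0`.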

SAM repository:

EGI sites should use this repository instead:
[egi-sam]
name=EGI SAM repo
baseurl=http://repository.egi.eu/sw/production/sam/1/$basearch
enabled=1
gpgcheck=0
protect=1
priority=10

Remove the old lcg-CA repository, if installed:

  • rm -f /etc/yum.repos.d/lcg-CA.repo

4. Repository priorities

Install yum-priorities:

yum install yum-priorities

Modify repository files:

Note: the glite-UI repository can have a higher priority than rpmforge. With yum-priorities, a lower number means a higher priority (1 is the highest).
  • rpmforge.repo
    ### Name: RPMforge RPM Repository for RHEL 5 - dag
    ### URL: http://rpmforge.net/
    [rpmforge]
    name = RHEL $releasever - RPMforge.net - dag
    baseurl = http://apt.sw.be/redhat/el5/en/$basearch/rpmforge
    mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge
    #mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
    enabled = 1
    protect = 0
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
    gpgcheck = 1
    priority=11
    exclude=libyaml
    
    
    [rpmforge-extras]
    name = RHEL $releasever - RPMforge.net - extras
    baseurl = http://apt.sw.be/redhat/el5/en/$basearch/extras
    mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge-extras
    #mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge-extras
    enabled = 1
    protect = 0
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
    gpgcheck = 1
    priority=11
    
    
    [rpmforge-testing]
    name = RHEL $releasever - RPMforge.net - testing
    baseurl = http://apt.sw.be/redhat/el5/en/$basearch/testing
    mirrorlist = http://apt.sw.be/redhat/el5/en/mirrors-rpmforge-testing
    #mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge-testing
    enabled = 0
    protect = 0
    gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
    gpgcheck = 1
  • glite-UI.repo
    [glite-UI]
    name=gLite 3.2 User Interface
    baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.2/glite-UI/sl5/x86_64/RPMS.release/
    gpgkey=http://glite.web.cern.ch/glite/glite_key_gd.asc
    gpgcheck=1
    enabled=1
    priority=16
    
    [glite-UI_updates]
    name=gLite 3.2 User Interface
    baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.2/glite-UI/sl5/x86_64/RPMS.updates/
    gpgkey=http://glite.web.cern.ch/glite/glite_key_gd.asc
    gpgcheck=1
    enabled=1
    priority=16
    
    [glite-UI_ext]
    name=gLite 3.2 User Interface
    baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.2/glite-UI/sl5/x86_64/RPMS.externals/
    gpgcheck=0
    enabled=1
    priority=16
EGI sites should use the EGI SAM repository described above instead of the following sa1-centos5-release.repo.
  • sa1-centos5-release.repo
    [egee-sa1]
    name=EGEE Packages from SA1 for CentOS5
    baseurl=http://www.sysadmin.hep.ac.uk/rpms/egee-SA1/centos5/$basearch
    enabled=1
    priority=10
    gpgcheck=0
  • slc5-extras.repo
    [slc5-extras]
    name=Scientific Linux CERN 5 (SLC5) add-on packages, no formal support
    baseurl=http://linuxsoft.cern.ch/cern/slc5X/$basearch/yum/extras/
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-cern
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-jpolok
    gpgcheck=1
    enabled=1
    protect=1
    priority=1
  • slc5-os.repo
    [slc5-os]
    name=Scientific Linux CERN 5 (SLC5) base system packages
    baseurl=http://linuxsoft.cern.ch/cern/slc5X/$basearch/yum/os/
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-cern
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-jpolok
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-csieh
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-dawson
    gpgcheck=1
    enabled=1
    protect=1
    priority=1
    exclude=php*,perl-DBI,MySQL-python,c-ares,perl-DBD-MySQL
  • slc5-updates.repo
    [slc5-updates]
    name=Scientific Linux CERN 5 (SLC5) bugfix and security updates
    baseurl=http://linuxsoft.cern.ch/cern/slc5X/$basearch/yum/updates/
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-cern
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-jpolok
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-csieh
           file:///etc/pki/rpm-gpg/RPM-GPG-KEY-dawson
    gpgcheck=1
    enabled=1
    protect=1
    priority=1
    exclude=php*,perl-DBI,MySQL-python,c-ares,perl-DBD-MySQL
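After editing the files, the effective order can be listed. A minimal sketch with a hypothetical `list_repo_priorities` helper; it assumes the yum-priorities default of 99 for repositories without a `priority` line and the yum default of `enabled=1`. Lower numbers are listed first because they win.

```shell
# Hypothetical helper: print "priority section" for every enabled repository
# in the given .repo files, highest precedence (lowest number) first.
list_repo_priorities() {
    awk '
        function flush() {
            if (sect != "" && enabled != "0") {
                p = (prio == "" ? 99 : prio)   # yum-priorities default
                print p, sect
            }
        }
        /^[ \t]*\[/ {
            flush()
            sect = $0
            sub(/^[ \t]*\[/, "", sect); sub(/].*$/, "", sect)
            enabled = "1"; prio = ""           # yum defaults for a new section
            next
        }
        {
            line = $0
            gsub(/[ \t]/, "", line)            # "enabled = 1" -> "enabled=1"
            if (line ~ /^enabled=/) enabled = substr(line, 9)
            if (line ~ /^priority=/) prio = substr(line, 10)
        }
        END { flush() }' "$@" | sort -k1,1n -k2,2
}
```

With the files above, `list_repo_priorities /etc/yum.repos.d/*.repo` should list the slc5-* repositories (priority 1) first and glite-UI (16) after rpmforge (11).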

5. Package installation

yum install lcg-CA
yum install httpd
yum install nagios
# make sure that nagios from EGI SAM repository is installed
yum --exclude=\*saga\* --exclude=\*SAGA\* groupinstall 'glite-UI (production - x86_64)'
yum install egee-NAGIOS

ARC probes: if you plan to use ARC probes to monitor ARC-CE services, the following additional steps are required.
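Before moving on to Yaim, it is worth confirming that the packages above are present. A minimal sketch; `missing_packages` is hypothetical and only checks package names with `rpm -q`. It does not show which repository a package came from, so the nagios-from-EGI-SAM check still has to be done by hand, and the glite-UI group is not covered.

```shell
# Hypothetical helper: print every listed package that rpm does not report
# as installed; the exit status is non-zero if anything is missing.
missing_packages() {
    missing=0
    for pkg in "$@"; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
            missing=1
        fi
    done
    return $missing
}
```

Usage: `missing_packages lcg-CA httpd nagios egee-NAGIOS || echo "install the packages listed above"`.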

6. Yaim configuration

The configurations below have been updated for Update-08, which requires far fewer variables.

6.1. NGI instance

EGI NGI/ROC SAM instances should set these variables.

Yaim configuration, for the ops VO:

# Generic
SITE_NAME=egee.srce.hr
SITE_BDII_HOST=ce1-egee.srce.hr
PX_HOST=se1-egee.srce.hr
BDII_HOST=bdii-egee.srce.hr
VOS="dteam ops"
VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"

VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"

RB_HOST=skurut2.cesnet.cz # irrelevant, RB is unsupported
VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # replace with your NGI WMSes
VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # replace with your NGI WMSes

# Nagios
NAGIOS_HOST=nagiosdev001.cern.ch
NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
NCG_NAGIOS_ADMIN=eimamagi@srce.hr
NAGIOS_ROLE=ngi
NCG_PROBES_TYPE=local
NCG_VO=ops
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NAGIOS_NSCA_PASS=MY_PASS

# NGI/ROC Nagios
COUNTRY_NAME=Croatia
NAGIOS_NCG_ENABLE_CRON=true
NCG_GOCDB_ROC_NAME=NGI_HR
NCG_TOPOLOGY_USE_GOCDB=false
NCG_TOPOLOGY_USE_ENOC=false
NCG_TOPOLOGY_USE_LDAP=false
NCG_REMOTE_USE_SAM=false
NCG_REMOTE_USE_NAGIOS=false
NCG_REMOTE_USE_ENOC=false
NCG_TOPOLOGY_USE_SAM=false
NCG_TOPOLOGY_USE_ATP=true
NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"
NAGIOS_SUDO_ENABLE_CONFIG=true

# DB data
MYSQL_ADMIN="MY_MYSQL_PASS"
DB_PASS="MY_MRS_PASS"

MYEGI_ADMIN_NAME="Admin Name"
MYEGI_ADMIN_EMAIL="admin@address.hr"
MYEGI_DEFAULT_PROFILE="ROC"
MYEGI_REGION="NGI_HR"
NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS,GLEXEC"
To speed up the process, remove the gcc package from the system.
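A typo in site-info.def usually surfaces only when Yaim fails. A minimal pre-flight sketch; `check_yaim_vars` is hypothetical and not part of Yaim. It sources the file in a subshell and lists the requested variables that are empty. Which variables to pass is your choice; the ones in the usage line are simply taken from the example above.

```shell
# Hypothetical pre-flight check: source site-info.def in a subshell (nothing
# leaks into the current shell) and report unset or empty variables.
check_yaim_vars() {
    file="$1"; shift
    # "." searches PATH for names without a slash, so force a relative path
    case "$file" in */*) ;; *) file="./$file" ;; esac
    (
        . "$file" || exit 2
        status=0
        for var in "$@"; do
            eval "val=\${$var:-}"
            if [ -z "$val" ]; then
                echo "missing: $var"
                status=1
            fi
        done
        exit "$status"
    )
}
```

Usage: `check_yaim_vars site-info.def NAGIOS_HOST NAGIOS_ROLE NCG_VO NAGIOS_NSCA_PASS MYSQL_ADMIN DB_PASS`.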

Run Yaim:

/opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

On a UI machine, run the following with your dteam credentials:

myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"
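The credential name and the retriever DN in this command depend on the Nagios host and the VO. A sketch that prints the command for other values; `myproxy_cmd` is hypothetical, and the `NagiosRetrieve-<host>-<vo>` naming pattern and the 336-hour lifetime are only inferred from the example above.

```shell
# Hypothetical helper: print the myproxy-init command for a Nagios host.
# Arguments: NAGIOS_HOST, MyProxy server (PX_HOST), VO, DN of the host certificate.
myproxy_cmd() {
    nagios_host="$1"; px_host="$2"; vo="$3"; host_dn="$4"
    printf 'myproxy-init -l nagios -s %s -k NagiosRetrieve-%s-%s -c 336 -x -Z "%s"\n' \
        "$px_host" "$nagios_host" "$vo" "$host_dn"
}
```

The DN of the host certificate can be printed on the Nagios host with `openssl x509 -in /etc/grid-security/hostcert.pem -noout -subject`; the output format depends on the OpenSSL version.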

6.2. Site instance, remote-only probes

This configuration should be used only for deploying a site-level SAM instance.

Yaim configuration:

# Generic
SITE_NAME=egee.srce.hr
SITE_BDII_HOST=ce1-egee.srce.hr
PX_HOST=se1-egee.srce.hr
BDII_HOST=bdii-egee.srce.hr
VOS="dteam ops"
VO_DTEAM_VOMS_SERVERS="'vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'"
VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"

# Nagios
NAGIOS_HOST=nagiosdev001.cern.ch
NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
NCG_NAGIOS_ADMIN=eimamagi@srce.hr
NAGIOS_ROLE=site
NCG_PROBES_TYPE=remote
NCG_VO=dteam
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NCG_REMOTE_USE_NAGIOS=true
NAGIOS_NSCA_PASS=MY_PASS

Run Yaim:

/opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-NAGIOS
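Across the examples on this page, glite-UI is configured only when local probes are enabled. A sketch of that rule; `yaim_nodes` is hypothetical, and the rule is inferred from these examples rather than from Yaim documentation.

```shell
# Hypothetical helper: choose Yaim node types from NCG_PROBES_TYPE.
# printf is used because echo would treat the leading "-n" as an option.
yaim_nodes() {
    case ",$1," in
        *,local,*) printf '%s\n' "-n glite-UI -n glite-NAGIOS" ;;
        *)         printf '%s\n' "-n glite-NAGIOS" ;;
    esac
}
```

Usage (word splitting of the result is intended): `/opt/glite/yaim/bin/yaim -s site-info.def -c $(yaim_nodes remote,local)`.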

6.3. Site instance, all probes

This configuration should be used only for deploying a site-level SAM instance.

Yaim configuration:

# Generic
SITE_NAME=egee.srce.hr
SITE_BDII_HOST=ce1-egee.srce.hr
PX_HOST=se1-egee.srce.hr
BDII_HOST=bdii-egee.srce.hr
VOS="dteam ops"
VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"

VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"

RB_HOST=skurut2.cesnet.cz
VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch"

# Nagios
NAGIOS_HOST=nagiosdev001.cern.ch
NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
NCG_NAGIOS_ADMIN=eimamagi@srce.hr
NAGIOS_ROLE=site
NCG_PROBES_TYPE=remote,local
NCG_VO=dteam
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NCG_REMOTE_USE_NAGIOS=true
NAGIOS_NSCA_PASS=MY_PASS
To speed up the process, remove the gcc package from the system.

Run Yaim:

/opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

On a UI machine, run the following with your dteam credentials:

myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"
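Every VO in VOS needs its own VO_<VO>_VOMS_SERVERS, VO_<VO>_VOMSES and VO_<VO>_VOMS_CA_DN entries when local probes run. A minimal sketch to spot a missing one; both helpers are hypothetical, and the name mapping (upper case, "." and "-" replaced by "_") is an assumption about the Yaim VO notation.

```shell
# Hypothetical helper: map a VO name to its assumed Yaim variable prefix.
vo_var_prefix() {
    printf 'VO_%s\n' "$(printf '%s' "$1" | tr 'a-z' 'A-Z' | sed 's/[.-]/_/g')"
}

# Report VOMS variables missing for any VO listed in VOS (sourced in a subshell).
check_vo_vars() {
    file="$1"
    case "$file" in */*) ;; *) file="./$file" ;; esac
    (
        . "$file" || exit 2
        status=0
        for vo in $VOS; do
            prefix=$(vo_var_prefix "$vo")
            for suffix in VOMS_SERVERS VOMSES VOMS_CA_DN; do
                eval "val=\${${prefix}_${suffix}:-}"
                if [ -z "$val" ]; then
                    echo "missing: ${prefix}_${suffix}"
                    status=1
                fi
            done
        done
        exit "$status"
    )
}
```

Usage: `check_vo_vars site-info.def`.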

6.4. VO feed instance

This configuration can be used only for VOs which provide VO feeds to ATP.
The list of these VOs can be found here.

Yaim configuration, for the cms VO this time:

# Generic
SITE_NAME=egee.srce.hr
SITE_BDII_HOST=ce1-egee.srce.hr
PX_HOST=se1-egee.srce.hr
BDII_HOST=bdii-egee.srce.hr
RB_HOST=skurut2.cesnet.cz # irrelevant, RB is unsupported

VOS="dteam ops cms"
VO_OPS_VOMS_SERVERS="vomss://voms.cern.ch:8443/voms/ops?/ops/"
VO_CMS_VOMS_SERVERS="'vomss://voms.cern.ch:8443/voms/cms?/cms/'"
VO_OPS_VOMSES="'ops lcg-voms.cern.ch 15009 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch ops 24' 'ops voms.cern.ch 15004 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch ops 24'"
VO_CMS_VOMSES="'cms lcg-voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch cms 24' 'cms voms.cern.ch 15002 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch cms 24'"
VO_OPS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"
VO_CMS_VOMS_CA_DN="'/DC=ch/DC=cern/CN=CERN Trusted Certification Authority' '/DC=ch/DC=cern/CN=CERN Trusted Certification Authority'"

VO_DTEAM_VOMS_SERVERS='vomss://voms.hellasgrid.gr:8443/voms/dteam?/dteam/'
VO_DTEAM_VOMSES="'dteam voms.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr dteam 24' 'dteam voms2.hellasgrid.gr 15004 /C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr dteam 24'"
VO_DTEAM_VOMS_CA_DN="'/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006' '/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006'"

VO_DTEAM_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # replace with your NGI WMSes
VO_OPS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # replace with your NGI WMSes
VO_CMS_WMS_HOSTS="wms204.cern.ch wms205.cern.ch" # replace with your NGI WMSes

# Nagios
NAGIOS_HOST=nagiosdev001.cern.ch
NAGIOS_ADMIN_DNS="/C=HR/O=edu/OU=srce/CN=Emir Imamagic"
NCG_NAGIOS_ADMIN=eimamagi@srce.hr
NAGIOS_ROLE=vo
NCG_PROBES_TYPE=local
NCG_VO=cms
NAGIOS_HTTPD_ENABLE_CONFIG=true
NAGIOS_NCG_ENABLE_CONFIG=true
NAGIOS_SUDO_ENABLE_CONFIG=true
NAGIOS_NAGIOS_ENABLE_CONFIG=true
NAGIOS_CGI_ENABLE_CONFIG=true
NAGIOS_NSCA_PASS=MY_PASS

# VO Nagios
NAGIOS_NCG_ENABLE_CRON=true
NCG_TOPOLOGY_USE_SAM=false
NCG_TOPOLOGY_USE_GOCDB=false
NCG_TOPOLOGY_USE_ENOC=false
NCG_TOPOLOGY_USE_LDAP=false
NCG_REMOTE_USE_SAM=false
NCG_REMOTE_USE_NAGIOS=false
NCG_REMOTE_USE_ENOC=false

NCG_USE_ATP_VO_FEED=true
NCG_TOPOLOGY_ATP_ROOT_URL="http://grid-monitoring.cern.ch/atp"

# DB data
MYSQL_ADMIN="MY_MYSQL_PASS"
DB_PASS="MY_MRS_PASS"

MYEGI_ADMIN_NAME="Admin Name"
MYEGI_ADMIN_EMAIL="admin@address.hr"
MYEGI_DEFAULT_PROFILE="ROC"
MYEGI_REGION="NGI_HR"
NCG_MDDB_SUPPORTED_PROFILES="ROC,ROC_CRITICAL,ROC_OPERATORS"
To speed up the process, remove the gcc package from the system.
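In every example on this page the VO in NCG_VO is also listed in VOS (here cms). A minimal consistency sketch; `ncg_vo_in_vos` is hypothetical and the check is inferred from the examples, not a documented Yaim rule.

```shell
# Hypothetical check: NCG_VO should be one of the VOs in VOS.
# The file is sourced in a subshell so nothing leaks into the current shell.
ncg_vo_in_vos() {
    file="$1"
    case "$file" in */*) ;; *) file="./$file" ;; esac
    (
        . "$file" || exit 2
        for vo in $VOS; do
            [ "$vo" = "$NCG_VO" ] && exit 0
        done
        echo "NCG_VO=${NCG_VO:-<unset>} is not listed in VOS=\"$VOS\""
        exit 1
    )
}
```

Usage: `ncg_vo_in_vos site-info.def`.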

Run Yaim:

/opt/glite/yaim/bin/yaim -s site-info.def -c -n glite-UI -n glite-NAGIOS

On a UI machine, run the following with your dteam credentials:

myproxy-init -l nagios -s se1-egee.srce.hr -k NagiosRetrieve-nagiosdev001.cern.ch-dteam -c 336 -x -Z "/DC=ch/DC=cern/OU=computers/CN=nagiosdev001.cern.ch"

6.5. VO instance

Installation notes for a VO instance were kindly provided by Gonçalo Borges (NGI_IBERGRID): https://wiki.egi.eu/wiki/VO_Services/VO_Service_Availability_Monitoring.

7. Additional configuration

7.1. Robot certificates

Starting from Update-09, SAM supports robot certificates as an alternative to MyProxy credentials. If your CA issues robot certificates, we suggest switching to them, as they are easier to maintain. Robot certificates also improve availability, since SAM no longer depends on the availability of the MyProxy server.

In order to use robot certificates set the following YAIM variables:

NCG_USE_ROBOT_CERT=true
# Robot cert and key can be different for each VO
# and standard Yaim VO notation is used
VO_OPS_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem
VO_OPS_ROBOT_KEY=/etc/nagios/globus/robot-key.pem
VO_DTEAM_ROBOT_CERT=/etc/nagios/globus/robot-cert.pem-dteam
VO_DTEAM_ROBOT_KEY=/etc/nagios/globus/robot-key.pem-dteam
The ROBOT_CERT and ROBOT_KEY variables use the standard Yaim VO notation. If VO directories (vo.d/voname) are used, the variables should be put in the appropriate VO files.
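A mismatched or expired robot certificate only shows up later as failing probes, so a quick check before running Yaim can help. A minimal sketch, assuming an unencrypted RSA key and `openssl` in PATH; `check_robot_pair` is a hypothetical name.

```shell
# Hypothetical helper: verify that a robot certificate and key belong together
# and that the certificate has not expired.
check_robot_pair() {
    cert="$1"; key="$2"
    cert_mod=$(openssl x509 -noout -modulus -in "$cert" 2>/dev/null) || { echo "unreadable certificate: $cert"; return 1; }
    key_mod=$(openssl rsa -noout -modulus -in "$key" 2>/dev/null) || { echo "unreadable key: $key"; return 1; }
    # Matching RSA moduli mean the key belongs to the certificate
    if [ "$cert_mod" != "$key_mod" ]; then
        echo "certificate and key do not match"
        return 1
    fi
    # -checkend 0 fails if the certificate has already expired
    if ! openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
        echo "certificate has expired"
        return 1
    fi
    echo "ok"
}
```

Usage: `check_robot_pair /etc/nagios/globus/robot-cert.pem /etc/nagios/globus/robot-key.pem`.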

7.2. ACE support in MyEGI

Currently this is supported only for the central MyEGI instance. Yaim configuration:

MYEGI_ACE=true

7.3 Monitoring gLExec services

The gLExec service requires the pilot role in the VOMS proxy certificate. To monitor gLExec services, make sure that you are allowed to use the pilot role in your VO.

In the Yaim configuration set the following variables:

NCG_HASH_CONFIG_PROFILES=<role_name>,GLEXEC
NCG_PROFILE_FQAN_GLEXEC=/<vo_name>/Role=pilot

where <role_name> is the name of your role in capital letters (e.g. NGI for an NGI instance).

Correct setting for NGI instances:

NCG_HASH_CONFIG_PROFILES=NGI,GLEXEC
NCG_PROFILE_FQAN_GLEXEC=/ops/Role=pilot
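The two settings follow a fixed pattern, so they can be generated. A minimal sketch; `glexec_settings` is hypothetical and simply upper-cases the role name it is given.

```shell
# Hypothetical helper: print the gLExec Yaim settings for a role name and VO.
glexec_settings() {
    role=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
    printf 'NCG_HASH_CONFIG_PROFILES=%s,GLEXEC\n' "$role"
    printf 'NCG_PROFILE_FQAN_GLEXEC=/%s/Role=pilot\n' "$2"
}
```

Usage: `glexec_settings ngi ops >> site-info.def`. Whether you may use the pilot role can be checked with `voms-proxy-init -voms ops:/ops/Role=pilot`.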

7.4 Setting alternative SE for metric org.sam.WN-RepRep

Starting with release Update-07, more than one replication SE can be specified for the WN replica test org.sam.WN-RepRep. The SEs can be given as a static list, a dynamic list, or both.

To define a static list of comma-separated hostnames, set the following Yaim variable:

JOBSUBMIT_WN_SE_REP=se1[,se2,se3...]

The dynamic list is filled with the SEs defined on the Nagios instance that recently passed the org.sam.SRM-All set of tests. To use the dynamic list, set the following Yaim variable:

JOBSUBMIT_WN_SE_REP_FILE=filename

The file name must be given without a path.
If the dynamic list is used, the metric hr.srce.GoodSEs is associated with the Nagios host. This metric generates the list of "good" SEs and passes the file as an input parameter to the org.sam.(CREAM)CE-JobState metric(s).

The org.sam.(CREAM)CE-JobState metric(s) take at most 3 hosts from the file and, if JOBSUBMIT_WN_SE_REP is defined, append them to the static list. On the WN, org.sam.WN-RepRep tries to replicate to the SEs in the given order until a replication succeeds. The metric returns CRITICAL if the file could not be replicated to any of the SEs. This fixes https://tomtools.cern.ch/jira/browse/SAM-442.
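The list handling described above can be modelled in a few lines. This is a hypothetical sketch, not the probes' code: it assumes the GoodSEs file holds one hostname per line, and it takes a caller-supplied command in place of the real replication step.

```shell
# Hypothetical model: static JOBSUBMIT_WN_SE_REP list first, then at most
# 3 hosts from the GoodSEs file (assumed: one hostname per line).
build_se_list() {
    static="$1"
    dynamic=$(sed '/^[[:space:]]*$/d' "$2" 2>/dev/null | head -n 3 | tr '\n' ',' | sed 's/,*$//')
    if [ -n "$static" ] && [ -n "$dynamic" ]; then
        printf '%s\n' "$static,$dynamic"
    else
        printf '%s\n' "$static$dynamic"
    fi
}

# Try the SEs in order until the copy command succeeds; status 2 stands for
# CRITICAL. COPY_CMD is any command or function taking a hostname.
replicate_first_ok() {
    se_list="$1"; copy_cmd="$2"
    for se in $(printf '%s' "$se_list" | tr ',' ' '); do
        if "$copy_cmd" "$se"; then
            echo "OK: replicated to $se"
            return 0
        fi
    done
    echo "CRITICAL: could not replicate to any of: $se_list"
    return 2
}
```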

7.5 Setting alternative BDII for metric org.sam.SRM-All

By default, the metric org.sam.SRM-All uses the sam-bdii.cern.ch top BDII. To make the tests less dependent on the CERN top BDII, it is suggested to set an alternative BDII.

To set an alternative BDII, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/srm.conf). There are two options:
1. Switch to your own top BDII:

MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!your.top.bdii

2. Use the site BDIIs:

MODIFY_METRIC_ATTRIBUTE!org.sam.SRM-All!SITE_BDII!--ldap-uri
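Either option can be written with a small helper. This is a hypothetical sketch that writes exactly one of the two lines above; the `!` characters are kept in single quotes so that interactive bash does not try history expansion.

```shell
# Hypothetical helper: write the org.sam.SRM-All BDII override to a localdb file.
# Usage: write_srm_bdii_localdb FILE top BDII_HOST | write_srm_bdii_localdb FILE site
write_srm_bdii_localdb() {
    target="$1"; mode="$2"
    case "$mode" in
        top)
            [ -n "$3" ] || { echo "usage: write_srm_bdii_localdb FILE top BDII_HOST" >&2; return 1; }
            line='MODIFY_METRIC_PARAMETER!org.sam.SRM-All!--ldap-uri!'"$3" ;;
        site)
            line='MODIFY_METRIC_ATTRIBUTE!org.sam.SRM-All!SITE_BDII!--ldap-uri' ;;
        *)
            echo "mode must be 'top' or 'site'" >&2; return 1 ;;
    esac
    printf '%s\n' "$line" > "$target"
}
```

Usage: `write_srm_bdii_localdb /etc/ncg/ncg-localdb.d/srm.conf top your.top.bdii`. The Known Issues section restarts the ncg service after adding a localdb file; the same step is likely needed here.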

7.6 Setting alternative LFC for metrics org.sam.WN-Rep*

By default, the org.sam.WN-Rep* metrics use the prod-lfc-shared-central.cern.ch LFC.

To set an alternative LFC, create a localdb file (e.g. /etc/ncg/ncg-localdb.d/LFC.conf):

MODIFY_METRIC_PARAMETER!org.sam.CREAMCE-JobState!--wn-lfc!lfc.my.domain
MODIFY_METRIC_PARAMETER!org.sam.CE-JobState!--wn-lfc!lfc.my.domain
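Hand-edited localdb files are easy to mistype. A minimal lint sketch; `check_localdb` is hypothetical and only knows the two directives used on this page, each with four `!`-separated fields. NCG may support other directives, which this sketch would report as unknown.

```shell
# Hypothetical lint for NCG localdb files: every non-empty line must use one of
# the two directives shown on this page and have four "!"-separated fields.
check_localdb() {
    awk -F'!' '
        NF == 0 { next }
        $1 != "MODIFY_METRIC_PARAMETER" && $1 != "MODIFY_METRIC_ATTRIBUTE" {
            printf "%s:%d: unknown directive %s\n", FILENAME, FNR, $1; bad = 1; next
        }
        NF != 4 {
            printf "%s:%d: expected 4 fields, found %d\n", FILENAME, FNR, NF; bad = 1
        }
        END { exit bad }' "$@"
}
```

Usage: `check_localdb /etc/ncg/ncg-localdb.d/*`.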

7.7 Monitoring Globus services

Globus services currently do not support VOs. To monitor Globus services, the SAM administrator has to contact every site and ask it to add the certificate DN to its grid-mapfile.
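A request to a site is clearer when it includes the exact grid-mapfile line. A minimal sketch using the usual `"DN" account` grid-mapfile format; both helpers are hypothetical, and the local account name is chosen by each site.

```shell
# Hypothetical helpers: format a grid-mapfile entry, and check whether a
# grid-mapfile already maps a DN.
gridmap_entry() {
    printf '"%s" %s\n' "$1" "$2"
}
gridmap_has_dn() {
    grep -Fq "\"$2\" " "$1"
}
```

Usage on a site: `gridmap_has_dn /etc/grid-security/grid-mapfile "<certificate DN>" || echo "not mapped yet"`.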

8. Installation validation

After Yaim has run successfully, the Nagios web interface should be available at https://NAGIOS_SERVER/nagios.

If you enabled local probes, first check that the MyProxy credential works by running the hr.srce.GridProxy-Get-VO metric on NAGIOS_SERVER. You can do this by forcing a check through the web interface, or from the command line:

nagios-run-check NAGIOS_SERVER hr.srce.GridProxy-Get-VO

MyEGI interface is at the address: https://NAGIOS_SERVER/myegi.

Check the resource BDII:

ldapsearch -x -LLL -h NAGIOS_SERVER -p 2170 -b Mds-Vo-Name=resource,O=grid "(GlueServiceType=*-NAGIOS)" GlueServiceEndpoint
dn: GlueServiceUniqueID=NAGIOS_SERVER_XXXXXX-NAGIOS_2937827985,Mds-Vo-name=
 resource,o=grid
GlueServiceEndpoint: https://NAGIOS_SERVER:443/nagios
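The output above is LDIF, where long lines are folded onto continuation lines starting with one space (as with the dn line). A minimal sketch that unfolds the output and prints only the endpoints, so it can be compared with the expected URL; `glue_endpoints` is a hypothetical name.

```shell
# Hypothetical helper: read ldapsearch LDIF on stdin, join folded lines
# (continuation lines start with a single space) and print the
# GlueServiceEndpoint values.
glue_endpoints() {
    awk '
        /^ / { cur = cur substr($0, 2); next }   # continuation: append
        { if (cur != "") print cur; cur = $0 }
        END { if (cur != "") print cur }' |
    sed -n 's/^GlueServiceEndpoint: //p'
}
```

Usage: pipe the `ldapsearch` command above into `glue_endpoints` and expect `https://NAGIOS_SERVER:443/nagios`.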

9. Known Issues

For machines running the latest version of glite-UI (3.2.10-1):

Please restart Nagios after running Yaim; otherwise you may see problems similar to SAM-1693.

service nagios restart

In new installations, add the following line to the [mysqld] section of /etc/my.cnf and restart MySQL:

[mysqld]
event-scheduler=1
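Whether an existing my.cnf already contains the setting can be checked with a short sketch. `mycnf_event_scheduler` is hypothetical; it relies on MySQL treating "-" and "_" in option names alike and only reads the file. The running server's value can be shown with `mysql -e "SHOW VARIABLES LIKE 'event_scheduler'"`.

```shell
# Hypothetical helper: report whether the [mysqld] section of a my.cnf enables
# the event scheduler; the last matching line in that section wins.
mycnf_event_scheduler() {
    awk '
        /^[ \t]*\[/ { sect = $0; gsub(/[ \t]/, "", sect); next }
        sect == "[mysqld]" {
            line = $0
            gsub(/[ \t]/, "", line)
            gsub(/_/, "-", line)       # event_scheduler == event-scheduler
            if (line ~ /^event-scheduler=/) found = (line ~ /=(1|ON|on)$/)
        }
        END {
            print (found ? "enabled" : "not enabled")
            exit (found ? 0 : 1)
        }' "${1:-/etc/my.cnf}"
}
```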

When using yum to upgrade a machine from Update-11 to Update-12, the following exclude option is required:

For a Nagios (egee-NAGIOS) machine:

yum --exclude=egee-NAGIOS-WEB update

For a gridmon (egee-NAGIOS-WEB) machine:

yum --exclude=egee-NAGIOS update
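The right exclude depends on which meta-package the machine carries. A minimal sketch; `upgrade_cmd` is hypothetical and only prints the command, so it can be reviewed before running it.

```shell
# Hypothetical helper: print the Update-11 to Update-12 yum command, choosing
# the exclude from the installed meta-package.
upgrade_cmd() {
    if rpm -q egee-NAGIOS >/dev/null 2>&1; then
        echo "yum --exclude=egee-NAGIOS-WEB update"
    elif rpm -q egee-NAGIOS-WEB >/dev/null 2>&1; then
        echo "yum --exclude=egee-NAGIOS update"
    else
        echo "neither egee-NAGIOS nor egee-NAGIOS-WEB is installed" >&2
        return 1
    fi
}
```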

If the monitored infrastructure contains WMS services but no CE services, the metric hr.srce.GoodCEs associated with the Nagios service fails with the following error:
HealthyNodes CRITICAL - No healthy hosts found.

To fix it, create the file /etc/ncg/ncg-localdb.d/GoodCEs-fix with the following content:

MODIFY_METRIC_PARAMETER!hr.srce.GoodCEs!--metric!org.sam.CREAMCE-JobSubmit

and restart the ncg service:

service ncg restart

10. Problems

A description of common problems when installing SAM can be found in the Troubleshooting section.

Document generated by Confluence on Feb 27, 2014 10:19